-
-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/adjoint odes #1905
Feature/adjoint odes #1905
Conversation
…eparate branch for adjoint ODEs)" This reverts commit 962ba83.
There are some questions to answer and things to do for this:
a. Tolerances for the forward solve These can be parameterized in absolute and relative tolerances, or the absolute tolerances of the backwards stuff can be parameterized by a vector. We gotta decide what to expose.
|
Basically the outline of the code is section 6.1 here: https://computing.llnl.gov/sites/default/files/public/cvs_guide.pdf The set of backwards ODEs is something like:
The way to think about this is we're actually solving a single adjoint program between each output. So integrating from
step_sens there).
The adjoints from the output look like a forcing function that is a series of delta functions at times |
Oh yeah in that notation |
After ages I finally found some time to look into this here. From what I see it looks as if we are close to getting this working, which should/would be great. A few Qs:
Looks like cool beans. @rok-cesnovar Would a new |
That's fixed I believe.
Not on this code. Once the timing stuff gets in it's gonna be waaaay easier to start talking about ODE performance: stan-dev/cmdstan#960 (comment) Hopefully that'll be within the month.
It's pretty confusing to track down. I think @charlesm93 has it written up somewhere. I know Charles also has some thoughts related to the adjoint problem as well. Yeah a separate function makes sense for this as well as some benchmarks at the Stan model level. It will be much easier to get this info if we wait on timing -- that gives us forward and reverse mode timings and it's easy to benchmark the different algos. I forget if this was passing all the existing tests, but it'll probably need a little dust knocked off to get up to date with develop. You wanna take over here? |
I'm not sure how helpful this is, but @betanalpha and I have a writeup about the adjoint method, but we only cover the math that motivates what the adjoint system should be, not the implementation details (checkpointing, etc). Section 1 in https://arxiv.org/abs/2002.00326 is our attempt at explaining this in an intelligible manner. We have some work in progress with generalizations for other implicit functions, but this is taking time to write. |
@bbbales2 Sure, I can take this up - I may need some help here and there which would be great. One thing I wonder is if we can reduce the number of quadrature problems to just what we need. I mean, right now we output for every state the derivatives for every parameter. For N ode states and P parameters this makes up N x P quadrature problems per time point. If we restrict to just one output (ideally the lpdf value or for now only one ode state needed), then that should be cheaper. @charlesm93 Skimming through your abstract it reads really similar to https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005331 (see supplements section 1). It's a bit more specialized though. Anyway, it's good to have another take on this, I will dive into yours as well. So the next steps for me are:
Then we are a few steps further. |
@wds15 Thanks for sharing the paper. The supplement section 1 is similar to our section 1, as both review the classic adjoint method for ODEs. While the authors discuss discrete times, my understanding is that the system is still assumed to evolve continuously between these times. Ultimately, equations 1, 9, 10, and 11 all describe ODEs / integrals to evaluate and differentiate the objective function. The discretization my co-authors and I investigated concerns difference equations (initially motivated by HMMs). There's no ODE and the sensitivities are not given by an integral but by a summation. I think the term "discrete" is ambiguous and maybe a little misleading in both papers. |
@charlesm93 yup, things are confusing indeed. Your paper though is relatively verbose and straightforward to follow which is great. @bbbales2 I just push a feature/adjoint-odes-v2 branch which is up to date with develop and I renamed the new stuff to ode_bdf_adjoint. The bad news is that a few things do not work right now:
I am using the test |
Sorry, missed this. Adding a new variadic signature is just a bit more work then adding a simple signature which is trivial. Once you figure out what you need do tag me and I can prepare that, really not a problem. |
The most challenging aspect of implementing adjoint ODE methods in general is the checkpointing and backwards interpolation, which we should definitely defer to the CVODES functionality. That leaves integrating the CVODES functions — CVODES does provide a sequence of functions for the forward solve, setting up the backwards system, and then the backwards solve — into the autodiff. Because we don’t know the adjoint until the reverse pass we have to either store the checkpointing information during the forward pass or re-solve the forward system during the reverse pass. The latter incurs a higher overhead, but the performance scaling with the number of parameters will still be better and it could be a helpful start to help motivate the best organization.
… On Jan 2, 2021, at 1:55 PM, wds15 ***@***.***> wrote:
@charlesm93 <https://github.com/charlesm93> yup, things are confusing indeed. Your paper though is relatively verbose and straightforward to follow which is great.
@bbbales2 <https://github.com/bbbales2> I just push a feature/adjoint-odes-v2 branch which is up to date with develop and I renamed the new stuff to ode_bdf_adjoint. The bad news is that a few things do not work right now:
Eigen data structures cannot anymore be part of the arguments due to changes in value_of related to expressions, I think. I don't have an idea what to change... but maybe it's an easy change to do?
The thing spits out wrong results whenever the parameters vary. I don't know why that is the case right now. @bbbales2 <https://github.com/bbbales2> are you aware of any conventions being different? I will need to thoroughly debug and solve this before I can continue.
I am using the test test/unit/math/rev/functor/ode_bdf_adjoint_grad_rev_test.cpp to test for correctness. This test passes ok on the old branch feature/adjoint-odes; so it's something due to the merge with develop. No idea right now.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#1905 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AALU3FWWQ2VAEH5AAXHLCVDSX5T2FANCNFSM4NIC4OYA>.
|
I will not implement this stuff for RK45 - just to make this clear from the start. If anyone makes this is a requirement for including this, then I will stop working now on this. @betanalpha We are using CVODES callbacks and all the facilities they provide. @bbbales2 Good news! I looked closely at what changed in cvodes_integrator in the meantime and found that some stuff changed wrt to Eigen expression stuff. By using For now I can continue as unit tests are ok with the
|
Yeah this came up before and @yizhang-yiz said implementing checkpointing wasn't a business we wanted to get in to.
Will give this a go later |
This test seems to be missing:
|
I branched off to a v2 version of yours. The file you are looking for is here: |
That built for me and the checks pass, though I got tons of sanitzer warnings about memory leaks (though that might just be the tests not doing a Ubuntu 18.04, Clang 6.0.0 |
Oh also feel free to just merge that branch in here and work here. Or we can close this and move to another pull. Your working branch should be the main thing since nobody else is directly working on this right now. |
Sorry, you need to uncomment tests which use eigen things. I did not want to leave this not compiling. Sure, we can merge v2 into this one and keep going here. Sorry for the confusion. |
My hopes dashed on the rocks |
… into feature/adjoint-odes
…4.1 (tags/RELEASE_600/final)
… into feature/adjoint-odes
@SteveBronder @bbbales2 @rok-cesnovar I think I solved the problem which we had with Once tests pass I think that this can be merged. @rok-cesnovar any idea why not all tests are being fired off? Or do they take a moment to pop up? |
Github seems to be a bit slow spawning the instances. Will monitor but this is typically resolved in a matter of a few hours. |
There seems to be an outage with Github actions: https://www.githubstatus.com/incidents/zbpwygxwb3gw stanc3 now properly errors if tolerances are not data. I started the build (https://github.com/stan-dev/cmdstan/actions/runs/846697281) but it needs the outage to be resolved. I also opened the stanc3 PR, will start begging for reviews tomorrow: stan-dev/stanc3#900 |
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
„All checks good“? Should we merge or fire off the other github actions somehow? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some minor comments. I tested the latest branch in cmdstan and the array problem seems to be gone.
This is good to approve/merge. I don't have permissions to do that since I made this PR.
stan::math::zero_adjoints(a, b, va, vb, c, d, e, vva, vvb, vc, vd, ve); | ||
stan::math::for_each( | ||
[](auto&& x) { stan::math::zero_adjoints(x); }, | ||
std::forward_as_tuple(a, b, va, vb, c, d, e, vva, vvb, vc, vd, ve)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need this foreach. The previous code was testing that the variadic thing worked with all types. Since we aren't doing the variadic thing here, no need for this anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is change by @SteveBronder (I think)... and this is exactly how we test this thing on develop right now. I recall that he said that the for_each thing is much easier on the compiler due to less nesting and it's potentially even faster. I leave it as is if you don't mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I mean this test for zero adjoints is not necessary, not the for_each
itself. This for_each is replacing:
stan::math::zero_adjoints(a, b, va, vb, c, d, e, vva, vvb, vc, vd, ve);
which is just not a test we need anymore. I definitely don't think we should remove the for_each
function -- just this specific zero_adjoints
test seemed extra.
Let me know when the docs are up and I'll review those. |
Will approve once these few last comments are resolved. The Github Actions stuff has been resolved. |
…4.1 (tags/RELEASE_600/final)
Done. Found some not quite up-to-date doxygen comments which I addressed as well. All good now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving assuming @SteveBronder agrees with the comment here: #1905 (comment)
I changed things to what @bbbales2 suggested, but that leads to a compiler error. This is because @SteveBronder changed things from a recursion design to a loop over tuple thing. At least this is how I understood.... if @SteveBronder disagrees, then there is a few more hours for this... Thanks for approving @rok-cesnovar ! |
I don't think the change I suggested should lead to compiler error -- was just a tiny cleanup of a specific test (delete 3 lines that should work but aren't really testing anything). Tried to clarify here. Edit: But also that specific change is definitely optional -- just noticed it as I was scrolling through the code this morning |
Is there a shortage on Windows testing nodes? @serban-nicusor-toptal any idea? |
Windows tests are skipped on PR (we run them with Github Actions anyway). Its waiting for GPU instances (the Columbia one is down). @serban-nicusor-toptal Maybe we up the restriction for these a bit for a few days around the release? |
Hey, sorry for the inconvenience. I've set the GPU machines to be more available so you don't need to wait that much sometimes. |
Looks like GPU tests have passed and now we are onto the upstream tests. Thanks! |
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
Good to go? |
Yeah, it all looks good to me |
I think so yes... so I hit merge???!!!! |
Hey sorry was apartment hunting all weekend. This all seems good! |
This PR implements the design as described in
https://github.com/stan-dev/design-docs/blob/master/designs/0027-adjoint-ode.md
The goal is to make reverse mode autodiff faster for large ODE systems.
Checklist
Math issue implement adjoint ODE solver #2486
Copyright holder: Columbia University, Sebastian Weber
The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
the basic tests are passing
./runTests.py test/unit
)make test-headers
)make test-math-dependencies
)make doxygen
)make cpplint
)the code is written in idiomatic C++ and changes are documented in the doxygen
the new changes are tested